Collaborators: Tomo Tanaka, John Eykelenboom

Shiny data explorers

Proposal

This is a continuation of the previous Chromatin Compaction Modelling project. John is working on new data, but this time we have the full 3D co-ordinates of each dot from the tracking software (which one?). The first step is to convert these co-ordinates into cell states, each denoted by a colour.

Description by John

Raw data for cells is in the form of Excel files. There are 3 files per cell:

  • the whole track length [fr x - z] = frames x to z
  • up to NEBD [fr x - y]
  • after NEBD [fr y+1 - z]

In the case of the first file type there is no assignment of NEBD so these cells will not be automatically aligned according to our previous work. Maybe the second two file types are better; the two halves can be stitched together and then different cells aligned with each other according to the join (let me know what you think if this will work well).

When I tracked the dots the objects are organised into “tracks” that always refer to either red or green (not mixed). The colour for a given track could be determined by looking at the intensities in ch.1 (red) or ch.2 (green) for the objects of the track (and compared to the intensities for the same channel in the other tracks). I could also assign the colours on the excel sheet if this would be easier (e.g. manually make a new tab with the info). [information for channel 3 and 4 are simply masked versions of 1 and 2 that I use for tracking more easily].

The rules and colour coding we worked with in our previous study (distances are approx.):

  • Light Blue = 2 dots (not overlapping e.g. > 0.3 µm)
  • Dark blue = 3 dots (where dots of the same colour have distance < 0.75 µm)
  • Brown = 3 dots (where dots of the same colour have distance > 0.75 µm)
  • Pink = 4 dots (no restriction on distance)
  • Red = 4 dots (with 2 pairs of red/green overlapping e.g. < 0.3 µm)

Later addition:

  • Black = 2 dots (overlapping e.g. < 0.3 µm)
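The two-dot and three-dot rules above can be sketched in a few lines of Python. This is only an illustration and not the code used for this report: the function name and the input layout (a list of colour and position pairs) are hypothetical, the limits are the approximate values quoted in the rules, and the four-dot states are left out because they need the red/green pairing.

```python
import math

# Approximate limits in µm, taken from the rules above.
OVERLAP_LIMIT = 0.3
SAME_COLOUR_LIMIT = 0.75


def state_for_two_or_three_dots(dots):
    """Return the colour-coded state for a frame with two or three dots.

    `dots` is a list of (colour, (x, y, z)) tuples, with colour "red" or "green"
    (a hypothetical layout). Returns None for any other configuration.
    """
    if len(dots) == 2:
        # Two dots: only the distance between them matters.
        d = math.dist(dots[0][1], dots[1][1])
        return "black" if d < OVERLAP_LIMIT else "light blue"

    if len(dots) == 3:
        # Group positions by colour and find the colour that has two dots.
        by_colour = {}
        for colour, position in dots:
            by_colour.setdefault(colour, []).append(position)
        pairs = [p for p in by_colour.values() if len(p) == 2]
        if not pairs:
            return None  # e.g. three dots of one colour: not covered by the rules
        d = math.dist(pairs[0][0], pairs[0][1])
        return "dark blue" if d < SAME_COLOUR_LIMIT else "brown"

    return None  # four dots need the red/green pairing and are not handled here
```

The rules do not say what happens when a distance is exactly equal to a limit; the sketch treats that case as "not overlapping" (light blue) and "far apart" (brown), which is an arbitrary choice.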

Colour identification

Objects are organised into “tracks” that always refer to one colour, red or green. The software outputs intensities measured in two channels: red and green. Initially, I thought I could simply decide the colour of a dot based on which intensity is larger, red or green. However, some tracks contain dots with green > red at some time points and red > green at others. See this example, made for cell_1. Tracks 1000000000 and 1000000101 have intensities on both sides of the red = green line.
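A quick way to find such tracks is to record, for each track, on which side of the red = green line its dots fall. The Python sketch below is an illustration only: the function name and the row layout (track ID, red intensity, green intensity) are hypothetical, and the numbers in the usage example are made up rather than taken from cell_1.

```python
from collections import defaultdict


def tracks_on_both_sides(rows):
    """Return the sorted IDs of tracks with dots on both sides of red = green.

    `rows` is an iterable of (track_id, red, green) tuples (hypothetical layout).
    """
    sides = defaultdict(set)
    for track_id, red, green in rows:
        if green != red:  # dots exactly on the line carry no information
            sides[track_id].add(green > red)
    return sorted(track for track, seen in sides.items() if len(seen) == 2)


# Made-up intensities: track "A" flips between green > red and red > green.
example = [("A", 100, 200), ("A", 300, 250), ("B", 100, 50), ("B", 120, 60)]
print(tracks_on_both_sides(example))
```

Tracks returned by this function are the ones for which a simple "larger channel wins" rule would give conflicting colours within a single track.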

For now, we use manual annotation, that is, each track ID has a colour assigned manually by John.

Intensity difference

Intensity in red and green channels can fluctuate and even a green dot can sometimes become a bit red. However, when we have two or more dots, it might be easier to assign colours by comparing them in each frame.

The figure below shows the intensity difference (green - red) for each dot at each time point for cell 1. The letters at the bottom indicate the state (L-light blue, B-dark blue, K-black, W-brown, P-pink, R-red). The digits show the number of points in the XYZ data. Bold font indicates missing points in intensity data.

Now we see that in most cases green dots have a higher green-red difference than red dots, as expected. This holds even for frames around -8 to -3 min, where red dots have a positive green-red difference: the green dots are still greener.
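The comparison described here can be written as a per-frame check: within one frame, every green dot should have a larger green-red difference than every red dot. The Python sketch below is an assumption-laden illustration, not the report's own code; the function name, the input layout and the intensities in the example are hypothetical.

```python
def frame_is_consistent(dots):
    """Check that green dots are 'greener' than red dots within one frame.

    `dots` is a list of (colour, red_intensity, green_intensity) tuples, where
    colour is the manually assigned track colour (hypothetical layout).
    Returns None when the frame lacks either colour, so nothing can be compared.
    """
    green_diffs = [green - red for colour, red, green in dots if colour == "green"]
    red_diffs = [green - red for colour, red, green in dots if colour == "red"]
    if not green_diffs or not red_diffs:
        return None
    return min(green_diffs) > max(red_diffs)


# Made-up frame: the red dot has a positive green-red difference (+500),
# but the green dot is still greener (+4000), so the frame is consistent.
print(frame_is_consistent([("green", 5000, 9000), ("red", 5000, 5500)]))
```

Because the check is relative, it tolerates frames where both colours drift towards green or red together, which an absolute green > red rule would not.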

There are a few issues with this, marked with grey boxes in the figure. They fall into a few types, which I discuss here by looking at the raw data.

-31 min

There is only one green dot in the figure. Here is some raw data from this frame.

Position
Position X  Position Y  Position Z  Unit  Category  Collection  Time  TrackID     ID
-1805.33    607.38      30.36       µ     Spot      Position    6     1000000047  52
-1805.52    607.47      30.52       µ     Spot      Position    6     1000000143  193

Intensity Median Ch=1 Img=1
Intensity Median  Unit  Category  Channel  Image    Time  TrackID     ID
7487              NA    Spot      1        Image 1  6     1000000047  52

Intensity Median Ch=2 Img=1
Intensity Median  Unit  Category  Channel  Image    Time  TrackID     ID
12923             NA    Spot      2        Image 1  6     1000000047  52

There are two dots in the Position sheet, but only one in each intensity sheet (red and green channels). Intensity data for track 1000000143 are missing in both channels.
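Missing intensities of this kind can be found by matching the two sheets on the Time, TrackID and ID columns and listing the dots that appear in the Position sheet only. The Python sketch below illustrates this with the keys from the frame above; the function name and the tuple layout are hypothetical, and this is not the code used for this report.

```python
def dots_missing_intensity(position_keys, intensity_keys):
    """Return the (Time, TrackID, ID) keys present in Position but not in Intensity."""
    measured = set(intensity_keys)
    return [key for key in position_keys if key not in measured]


# Keys (Time, TrackID, ID) from the frame at -31 min shown above.
position = [(6, 1000000047, 52), (6, 1000000143, 193)]
intensity_ch1 = [(6, 1000000047, 52)]

print(dots_missing_intensity(position, intensity_ch1))
# → [(6, 1000000143, 193)]
```

Running the same check against each channel sheet separately shows whether a dot is missing in one channel or in both.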

-13 min

Here we have two red dots in the plot.

Position
Position X  Position Y  Position Z  Unit  Category  Collection  Time  TrackID     ID
-1803.68    604.43      29.02       µ     Spot      Position    24    1000000047  70
-1804.14    604.45      28.86       µ     Spot      Position    24    1000000143  172
-1803.38    603.94      29.23       µ     Spot      Position    24    1000000143  173

Intensity Median Ch=1 Img=1
Intensity Median  Unit  Category  Channel  Image    Time  TrackID     ID
9083              NA    Spot      1        Image 1  24    1000000143  172
8027              NA    Spot      1        Image 1  24    1000000143  173

Intensity Median Ch=2 Img=1
Intensity Median  Unit  Category  Channel  Image    Time  TrackID     ID
4949              NA    Spot      2        Image 1  24    1000000143  172
7139              NA    Spot      2        Image 1  24    1000000143  173

As above, intensity data are missing, this time for track 1000000047. There are three dots with measured positions, but only the two from track 1000000143 have measured intensities.

+6 min

This is an interesting case.

Position
Position X  Position Y  Position Z  Unit  Category  Collection  Time  TrackID     ID
-1799.16    605.16      32.66       µ     Spot      Position    43    1000000000  6
-1798.56    605.17      32.73       µ     Spot      Position    43    1000000000  7
-1798.51    605.17      32.68       µ     Spot      Position    43    1000000088  98
-1799.02    605.13      32.23       µ     Spot      Position    43    1000000088  99

Intensity Median Ch=1 Img=1
Intensity Median  Unit  Category  Channel  Image    Time  TrackID     ID
5578.0            NA    Spot      1        Image 1  43    1000000000  6
5079.0            NA    Spot      1        Image 1  43    1000000000  7
5079.0            NA    Spot      1        Image 1  43    1000000088  98
4040.5            NA    Spot      1        Image 1  43    1000000088  99

Intensity Median Ch=2 Img=1
Intensity Median  Unit  Category  Channel  Image    Time  TrackID     ID
5877              NA    Spot      2        Image 1  43    1000000000  6
8890              NA    Spot      2        Image 1  43    1000000000  7
8890              NA    Spot      2        Image 1  43    1000000088  98
6979              NA    Spot      2        Image 1  43    1000000088  99

There are four dots in both the position and intensity sheets, but the intensities of the dots in rows 2 and 3 are identical in each channel. It looks as if the intensity of one dot was missing and was replaced with the intensity of another dot. As a result, we have a red dot with a green intensity much larger than it should be.
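Such copied values can be flagged automatically by looking for dots in the same frame that share the same intensity in both channels. The Python sketch below uses the four dots from the frame above; the function name and the row layout are hypothetical, and the assumption that an exact match in both channels signals a copied value (rather than a coincidence) is mine.

```python
from collections import defaultdict


def duplicated_intensities(rows):
    """Group dots that share identical intensities in both channels within a frame.

    `rows` is an iterable of (time, track_id, dot_id, ch1, ch2) tuples
    (hypothetical layout). Returns {(time, ch1, ch2): [(track_id, dot_id), ...]}
    for every group with more than one dot.
    """
    groups = defaultdict(list)
    for time, track_id, dot_id, ch1, ch2 in rows:
        groups[(time, ch1, ch2)].append((track_id, dot_id))
    return {key: dots for key, dots in groups.items() if len(dots) > 1}


# The four dots at +6 min (Time = 43) shown above.
frame_43 = [
    (43, 1000000000, 6, 5578.0, 5877),
    (43, 1000000000, 7, 5079.0, 8890),
    (43, 1000000088, 98, 5079.0, 8890),
    (43, 1000000088, 99, 4040.5, 6979),
]

print(duplicated_intensities(frame_43))
# → {(43, 5079.0, 8890): [(1000000000, 7), (1000000088, 98)]}
```

The check cannot tell which of the two dots holds the original value, so flagged frames still need a look at the raw data.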

State identification

I follow the rules as outlined in the proposal. They are fairly straightforward, except for the case of four dots.

From John:

In the cell each green dot is linked directly to one red dot (they are on the same sister chromosome) and likewise the same green dot is not linked to one red dot. Unfortunately we cannot say unambiguously which green should match with which red as we have no way to distinguish (or our analysis so far has not been so sophisticated). In our JCB paper (you can look at Figure S1C-F) I did some measurements in a small batch of data. As you can see when I was plotting distances between red and green dots (part E) I obtained all four possible distances and put them into the two possible combinations (a & b or c & d). For the analysis and plotting I just took the shortest combined pairs of distances between red and green dots – in the cartoon this is a and b (and discard distances c and d). For red pattern to be true, then a and b should both be less than 0.3-0.4µm (otherwise the pattern is deemed pink).

I follow this approach. I calculate distances for the two possible combinations of red and green dots, and then select the combination with the smaller mean distance. The state is deemed red when both distances in this combination are less than the limit.
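The four-dot procedure can be sketched as follows in Python. This is an illustration under stated assumptions and not the code used for this report: the function name and input layout are hypothetical, and the default limit of 0.4 µm is one value from the 0.3-0.4 µm range quoted by John.

```python
import math


def four_dot_state(reds, greens, limit=0.4):
    """Pair two red and two green dots and return (state, (a, b)).

    `reds` and `greens` each hold two (x, y, z) positions in µm. The two possible
    red/green pairings are compared and the one with the smaller mean distance
    is kept. The state is "red" when both kept distances are below `limit`,
    otherwise "pink".
    """
    (r1, r2), (g1, g2) = reds, greens
    combinations = [
        (math.dist(r1, g1), math.dist(r2, g2)),
        (math.dist(r1, g2), math.dist(r2, g1)),
    ]
    a, b = min(combinations, key=lambda pair: sum(pair) / 2)
    state = "red" if a < limit and b < limit else "pink"
    return state, (a, b)


# Made-up positions: both red/green pairs are 0.1 µm apart, so the state is red.
print(four_dot_state([(0, 0, 0), (1, 0, 0)], [(0.1, 0, 0), (1.1, 0, 0)]))
```

With this rule a frame can be pink even though one of its two distances is below the limit, because red requires both distances to be small.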

This cartoon figure defines distances a, b, r and g.

Tracking cells

This figure demonstrates how state tracking works. The shapes indicate the number of dots detected, the colour indicates the state. The horizontal dashed lines show the distance limits applied.

This is a more traditional look at the data:

What might be of interest is the distribution of distances between dots. The next figure shows the aggregated distribution across all cells, for two, three and four dots detected. In the case of four dots, both distances are plotted, hence there are twice as many data points as images. Also, there are pink states below the limit (0.4 µm): these are the cases where only one of the two distances is below the limit.

Another plot showing all distances (a, b, r and g):

Red-red/green-green angle

Angle distribution between red-red and green-green vectors - only for cases with four dots.
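For reference, the angle between the two vectors can be computed from their normalised dot product. The Python sketch below is only an illustration: the function name and input layout are hypothetical, and the optional folding into the 0-90° range is my assumption (the order of the two dots within a colour is arbitrary, so a vector and its reverse describe the same pair); the report does not say which convention the figures use.

```python
import math


def red_green_angle(reds, greens, fold=False):
    """Angle in degrees between the red-red and green-green vectors.

    `reds` and `greens` each hold two (x, y, z) positions. With fold=True,
    angles above 90° are mapped to 180° minus the angle. Returns None when
    either pair of dots coincides, as the angle is then undefined.
    """
    v = [q - p for p, q in zip(reds[0], reds[1])]
    w = [q - p for p, q in zip(greens[0], greens[1])]
    norm_v, norm_w = math.hypot(*v), math.hypot(*w)
    if norm_v == 0 or norm_w == 0:
        return None
    cosine = sum(x * y for x, y in zip(v, w)) / (norm_v * norm_w)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding just outside [-1, 1]
    angle = math.degrees(math.acos(cosine))
    if fold and angle > 90:
        angle = 180 - angle
    return angle


# Made-up positions: perpendicular red-red and green-green vectors.
print(red_green_angle([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (0, 1, 0)]))
```

When the two dots of a colour are very close together, small position errors change the direction of their vector a lot, so angles from such frames are less reliable.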

Here is a timeline of angles, divided into windows of 10 min.

The next figure shows the relation between maximum of the distances a and b, and the angle.

The same figure, but split into time windows.

What if the large red-green angles at small a/b distances are a result of increased error when the r/g distances are also small? This figure shows the red-green angle as a function of the smaller of the r and g distances. It only contains data below the red-pink limit (the vertical dashed line in the figures above).

Indeed, most of the excessive angles appear when r or g is small. Perhaps, with the brown-pink limit set to 0.5, these high angles can be attributed to brown.

Session info

## R version 4.1.0 (2021-05-18)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] kableExtra_1.3.4 forcats_0.5.1    stringr_1.4.0    dplyr_1.0.7      purrr_0.3.4     
##  [6] readr_2.0.0      tidyr_1.1.3      tibble_3.1.3     ggplot2_3.3.5    tidyverse_1.3.1 
## [11] targets_0.6.0   
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.2          sass_0.4.0          jsonlite_1.7.2      viridisLite_0.4.0  
##  [5] modelr_0.1.8        bslib_0.2.5.1       RcppParallel_5.1.4  assertthat_0.2.1   
##  [9] highr_0.9           vipor_0.4.5         cellranger_1.1.0    yaml_2.2.1         
## [13] pillar_1.6.2        backports_1.2.1     glue_1.4.2          digest_0.6.27      
## [17] rvest_1.0.1         colorspace_2.0-2    stringfish_0.15.2   htmltools_0.5.1.1  
## [21] pkgconfig_2.0.3     broom_0.7.9         haven_2.4.2         scales_1.1.1       
## [25] webshot_0.5.2       processx_3.5.2      svglite_2.0.0       RApiSerialize_0.1.0
## [29] tzdb_0.1.2          generics_0.1.0      farver_2.1.0        ellipsis_0.3.2     
## [33] withr_2.4.2         cli_3.0.1           magrittr_2.0.1      crayon_1.4.1       
## [37] readxl_1.3.1        evaluate_0.14       ps_1.6.0            fs_1.5.0           
## [41] fansi_0.5.0         xml2_1.3.2          beeswarm_0.4.0      tools_4.1.0        
## [45] data.table_1.14.0   hms_1.1.0           lifecycle_1.0.0     munsell_0.5.0      
## [49] reprex_2.0.0        callr_3.7.0         compiler_4.1.0      jquerylib_0.1.4    
## [53] qs_0.25.1           systemfonts_1.0.2   rlang_0.4.11        grid_4.1.0         
## [57] rstudioapi_0.13     igraph_1.2.6        labeling_0.4.2      rmarkdown_2.9      
## [61] gtable_0.3.0        codetools_0.2-18    DBI_1.1.1           R6_2.5.0           
## [65] lubridate_1.7.10    knitr_1.33          utf8_1.2.2          stringi_1.7.3      
## [69] ggbeeswarm_0.6.0    Rcpp_1.0.7          vctrs_0.3.8         dbplyr_2.1.1       
## [73] tidyselect_1.1.1    xfun_0.24